Abstract



This paper proves that there does not exist a polynomial-time
algorithm for the subset sum problem. As this problem is in NP,
the result implies that the class P of problems admitting
polynomial-time algorithms does not equal the class NP of problems
admitting nondeterministic polynomial-time algorithms.

Keywords: computational complexity, polynomial-time, algorithm,
knapsack problem






1 Introduction 



The paper shows that there do not exist polynomial-time algorithms
for the subset sum problem, called the knapsack problem in the
Merkle-Hellman cryptosystem. We start by defining the formulations
used in this paper. Let $\mathbb{N}$ and $\mathbb{R}$ denote the
natural and real numbers respectively.

Definition 1. A knapsack is a pair of the form $(j, (d_1, \ldots, d_n))$
where $j, n \in \mathbb{N}$, $j, n > 0$ and $d_k \in \mathbb{N}$, $d_k > 0$
for $1 \le k \le n$.

The knapsack problem means the following: given a knapsack
$(j, (d_1, \ldots, d_n))$, determine if there exist binary numbers
$c_k \in \{0, 1\}$, $1 \le k \le n$, such that

$$j = \sum_{k=1}^{n} c_k d_k.$$

Let $B, \alpha \in \mathbb{R}$, $B \ge 1$, $\alpha \ge 0$ be fixed numbers.
An algorithm A is called a polynomial-time algorithm for the knapsack
problem if there exist numbers $C, \beta \in \mathbb{R}$ that depend on
$B$ and $\alpha$ but not on $n$ such that the following condition is
true: For any sequence of knapsacks of the form

$$((j_n, (d_{1,n}, \ldots, d_{n,n})))_{n \ge 1}$$

satisfying

$$\log_2 j_n < B n^{\alpha}, \quad \log_2 d_{k,n} < B n^{\alpha}, \quad (1 \le k \le n), \ (n \ge 1) \tag{1.1}$$

the number $N_n$ of elementary operations that the algorithm A
needs to produce an answer yes or no to the question if there
exist binary numbers $c_{k,n} \in \{0, 1\}$, $1 \le k \le n$, such that

$$j_n = \sum_{k=1}^{n} c_{k,n} d_{k,n} \tag{1.2}$$

satisfies $N_n < C n^{\beta}$ for all $n \ge 1$.
Remark 1. In the definition of a polynomial-time algorithm for
the knapsack problem we have included an upper bound on $j_n$
and on each $d_{k,n}$, $1 \le k \le n$. Such bounds are necessary for the
following reason. The number $m$ of bits in the binary representation
of $j_n$ satisfies $m - 1 \le \log_2 j_n < m$. Thus, if $\log_2 j_n$ grows
faster than any polynomial as a function of $n$ then so does the
length of $j_n$ in the binary representation. Verifying that (1.2)
is satisfied requires operations (such as compare, copy, read, add,
subtract, multiply, divide, modulus) that act on a representation
of $j_n$ in some number base. We may assume that the number base
is 2, as changing the number base does not change the character of
the algorithm from polynomial-time to non-polynomial-time. Any
operation that requires all bits of $j_n$ must require more than a
polynomial number of elementary operations from any algorithm A
if the number of bits in $j_n$ grows faster than any polynomial.
Similar comments apply to $d_{k,n}$.

Remark 2. The problem that has been described is used in the
Merkle-Hellman knapsack cryptosystem and today it is commonly
known as the knapsack problem. The name subset sum problem is
used for it in [2] p. 301, while the name knapsack problem is
reserved for a more general problem involving selecting objects
with weights and profits. The name knapsack is more convenient
than subset sum and it is often used in this paper.



2 The method of the proof 

In this article we give a proof that the subset sum problem cannot
be solved in polynomial time. The result settles the P versus NP
problem [1]. It is understandable to be sceptical of proofs
purporting to solve well-known problems, but sometimes the proofs
are correct. The Poincaré conjecture was proven some years ago, and
Statement D of the Clay Mathematics Institute Navier-Stokes problem
was, against expectations, proven by the author; the proof was
recently published in a peer-reviewed journal [3]. Thus, excessive
scepticism is not always good either. There are rumored to be
thousands of so-called crank proofs of the P versus NP problem.
This is very much an overstatement. The P versus NP page kept by
G. J. Woeginger at www.win.tur.nl lists 59 attempts as of
3 August 2010. In September 2008, when the present proof was posted
to arXiv, there were 44 attempts. It would not be a major effort
for the mathematical community to check all of these 59 proofs,
because many are not serious and most have been or can easily be
shown incorrect.

The presented proof spent over 15 months with a peer-reviewed
journal, and the four referee statements found only a few very
minor mistakes of the misprint type. However, the editor did not
see a possibility of finding a referee who would read the
manuscript to the end. This was probably because the manuscript
was not well structured and the proof was not broken into small
lemmas. In this new version the logical structure of the proof is
improved while no new arguments have been added. As no errors were
found in the reviews, there are no corrections in the proof. Much
discussion is added to address the issues that a previous referee
found unclear. These discussion parts and remarks make the article
longer, but they should not be removed before the method of the
proof is correctly understood. A previous referee stated two main
unclear points. The first was whether Section 3 makes assumptions
on the way the algorithm works, and the second was whether the
lower bound proof of Section 4 can be made without specifying a
generic model of computation. The added discussion hopefully shows
that these issues are correctly treated in the proof.

Before going to the proof let us look at a simple algorithm which
demonstrates that the difficulty in finding a polynomial-time
algorithm for the subset sum problem is caused by the sum in (1.2)
having a bound that grows faster than any polynomial, as indicated
in (1.1). If the upper bound of $j$ grows polynomially, there
exists a polynomial-time algorithm for the subset sum problem.
The algorithm in Lemma 2 is very slow but it runs in polynomial
time. It calculates an exponentially growing number of combinations
of $c_k$ in the same polynomial-time run. The algorithm in Lemma 2
is not a practical competitor to existing algorithms for solving
the subset sum problem. Effective algorithms exist for many, or
maybe even for almost all, cases of the subset sum problem.
Effective algorithms usually do not compute an exponential number
of combinations of values $c_k$ in the same run. They limit the
search to some subtrees.

Lemma 1. Let $B \ge 1$, $\alpha \ge 0$ and $\gamma \ge 0$ be selected.
Let $r_n > 0$ and $j_n$ be integers satisfying

$$r_n < n^{\gamma}, \quad \log_2 j_n < B n^{\alpha} \quad (n \ge 1).$$

There exist numbers $C, \beta \in \mathbb{R}$, $C \ge 1$, $\beta \ge 0$ and
an algorithm that given any sequence of knapsacks

$$((j_n, (d_{1,n}, \ldots, d_{n,n})))_{n \ge 1}$$

can determine for each $n$ if there exist binary numbers $c_{k,n}$,
$1 \le k \le n$, such that

$$j_n \equiv \sum_{k=1}^{n} c_{k,n} d_{k,n} \pmod{r_n}. \tag{2.1}$$

The number $N_n$ of elementary operations needed by the algorithm
satisfies $N_n < C n^{\beta}$ for every $n \ge 1$.

Proof. The bound on the logarithm of $j_n$ guarantees that modular
arithmetic operations on $d_{k,n}$ can be made in polynomial time
since we can assume that $d_{k,n} \le j_n$. We can find the numbers
$c_{k,n}$ by computing numbers $s_{k,j,n}$ from the recursion equations
for $k$

$$s_{k,j,n} = s_{k-1,j,n} + s_{k-1,(j-d_{k,n}) (\mathrm{mod}\ r_n),n} \tag{2.2}$$

$$s_{0,j,n} = \delta_{j=0},$$

where the index $j$ ranges from 0 to $r_n - 1$ and is calculated
modulo $r_n$. The index $n$ is fixed and only indicates that the
numbers are for the $n$-th knapsack. Here $\delta_x$ is an indicator
function: $\delta_x = 1$ if the statement $x$ (i.e., $j$ equals 0 in
(2.2)) is true and $\delta_x = 0$ if $x$ is false. Let

$$G_{k,n}(x) = \sum_{j=0}^{r_n-1} s_{k,j,n} x^{j}$$

where $|x| < 1$. From (2.2) follows

$$\sum_{j=0}^{r_n-1} s_{k,j,n} x^{j} = \sum_{j=0}^{r_n-1} s_{k-1,j,n} x^{j} + \sum_{j=0}^{r_n-1} s_{k-1,(j-d_{k,n}) (\mathrm{mod}\ r_n),n} x^{j}.$$

Changing summation to $j' = j - d_{k,n}$ yields

$$G_{k,n}(x) = G_{k-1,n}(x) + \sum_{j'=-d_{k,n}}^{r_n-1-d_{k,n}} s_{k-1,j' (\mathrm{mod}\ r_n),n} x^{j'+d_{k,n}}.$$

Changing the order of summation of $j'$ shows that

$$G_{k,n}(x) = G_{k-1,n}(x) + x^{d_{k,n}} \sum_{j'=0}^{r_n-1} s_{k-1,j',n} x^{j'}. \tag{2.3}$$

Simplifying (2.3) gives

$$G_{k,n}(x) = G_{k-1,n}(x) + x^{d_{k,n}} G_{k-1,n}(x).$$

As $G_{0,n}(x) = s_{0,0,n} = 1$, we get

$$G_{n,n}(x) = \prod_{k=1}^{n} \left(1 + x^{d_{k,n}}\right).$$

Expanding the product shows that $s_{k,j,n} \ne 0$ if and only if there
exist binary numbers $c_m \in \{0, 1\}$, $1 \le m \le k$, satisfying

$$j \equiv \sum_{m=1}^{k} c_m d_{m,n} \pmod{r_n}.$$

For $j = j_n$ and $k = n$ we get the knapsack problem. This means
that we can solve the knapsack problem by computing all $s_{k,j,n}$
from (2.2). We do not actually need the numbers $s_{k,j,n}$ but only
the information whether $s_{k,j,n} \ne 0$. Therefore we will not compute
the terms $s_{k,j,n}$ directly but calculate binary numbers
$b_{j,k} \in \{0, 1\}$ by Algorithm A0 below. The number $b_{j,k}$
calculated by A0 is zero if and only if the number $s_{k,j,n}$ is zero.
Algorithm A0:

Loop from k = 0 to k = n with the step k := k + 1 do {
    Loop from j = 0 to j = r_n - 1 with the step j := j + 1 do
        b_{j,k} := 0
}

b_{0,0} := 1

Loop from k = 1 to k = n with the step k := k + 1 do {
    M := min{r_n - 1, sum_{m=1}^{k} d_{m,n}}
    Loop from j = 0 to j = M with the step j := j + 1 do {
        If (b_{j,k-1} = 0 and b_{(j - d_{k,n}) (mod r_n), k-1} = 0) b_{j,k} := 0
        else do b_{j,k} := 1
    }
}

If b_{j,n} = 1 for j = j_n (mod r_n) do result := TRUE else do result := FALSE

Algorithm A0 loops from $k = 0$ to $k = n$ and from $j = 0$ to
$j = r_n - 1 < n^{\gamma}$. Thus A0 needs a polynomial number of
elementary operations as a function of $n$ in order to give the
result TRUE or FALSE to the existence of a solution to (2.1).
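
The recursion behind Algorithm A0 can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's own code: the function names `subset_sum_mod` and `subset_sum_exact` are hypothetical, the inner loop runs over all residues instead of stopping at M, and the exact variant assumes the modulus `sum(d) + 1`, chosen here so that distinct subset sums never share a residue.

```python
def subset_sum_mod(d, target, r):
    """Decide whether some subset of d sums to target modulo r (r > 0)."""
    # reachable[j] is True iff some subset of the items processed so far
    # has a sum congruent to j modulo r; it plays the role of b_{j,k}.
    reachable = [False] * r
    reachable[0] = True  # the empty subset has sum 0
    for dk in d:
        shift = dk % r
        prev = reachable
        # Either skip d_k (prev[j]) or take it (prev[(j - d_k) mod r]).
        reachable = [prev[j] or prev[(j - shift) % r] for j in range(r)]
    return reachable[target % r]


def subset_sum_exact(d, target):
    """Exact subset sum via the modular routine (assumed modulus sum(d) + 1)."""
    if target < 0 or target > sum(d):
        return False
    return subset_sum_mod(d, target, sum(d) + 1)
```

The loop performs about `len(d) * r` elementary steps, which is polynomial whenever `r` is polynomially bounded. With a small modulus the answer is only modular: `subset_sum_mod([3, 5, 9], 7, 5)` is true because 3 + 9 = 12 is congruent to 7 modulo 5, although no subset sums to exactly 7.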

Lemma 2. Let $B, \alpha \in \mathbb{R}$, $B \ge 1$, $\alpha \ge 0$ be fixed.
There exist numbers $C, \beta \in \mathbb{R}$, $C \ge 1$, $\beta \ge 0$ and
an algorithm that for any sequence

$$((j_n, (d_{1,n}, \ldots, d_{n,n})))_{n \ge 1}$$

of knapsacks satisfying

$$j_n < B n^{\alpha}, \quad d_{k,n} \le j_n \quad (1 \le k \le n),$$

can determine if there exist binary numbers $c_{k,n}$, $1 \le k \le n$,
such that

$$j_n = \sum_{k=1}^{n} c_{k,n} d_{k,n}.$$

The number $N_n$ of elementary operations needed by the algorithm
satisfies $N_n < C n^{\beta}$ for every $n \ge 1$.

Proof. The result follows directly from Lemma 1 by selecting
$r_n = \sum_{k=1}^{n} d_{k,n} \le n j_n$.

Remark 3. Lemma 2 solves all possible values of $j_n < B n^{\alpha}$
with the same polynomial-time run of Algorithm A0 because $j_n$ is
not used in A0 before checking the final result $b_{j_n,n}$. Let us
consider the case when $j_n$ is not limited from above by a
polynomial of $n$. Lemma 1 runs in polynomial time even if the
upper bound for $j_n$ grows faster than a polynomial of $n$, but it
does not produce results that can tell if there exists a solution
for a particular value $j_n$. A polynomial-time test, such as taking
a modulus in (2.1), maps the superpolynomial set of possible values
of $j = \sum_{k=1}^{n} c_{k,n} d_{k,n}$ into a polynomial number of
classes. In (2.1) the classes are all sums $j$ with the same residue
modulo $r_n$. At least one such class corresponds to a
superpolynomial number of values $j$. In order to check if any value
$j$ in the class equals $j_n$, the algorithm should in some way check
all of the values $j$ in the class, but if the algorithm checks all
values of $j$ in the same run then it apparently should loop over a
superpolynomial set, which is not possible for a polynomial-time
algorithm. In general, we can say that a single polynomial-time run
of an algorithm cannot solve all values of $j_n$ that are below a
superpolynomial upper bound, because the algorithm can only produce
a polynomial number of results and there exists a superpolynomial
number of possible values $j_n$. A polynomial-time algorithm that
solves the subset sum problem for any value $j_n$ below a
superpolynomial upper bound must limit the search, and there must be
values $j_n$ that are solved with different runs of the algorithm.
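
The counting step in Remark 3 (many subset sums falling into few residue classes) can be illustrated on a toy instance. The sketch below is an assumption-laden illustration only: the helper `residue_class_sizes` and the instance of powers of two are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def residue_class_sizes(d, r):
    """Count the distinct subset sums of d that fall into each residue class mod r."""
    sums = {
        sum(c * x for c, x in zip(bits, d))
        for bits in product((0, 1), repeat=len(d))
    }
    return Counter(s % r for s in sums)


# Hypothetical instance: the subset sums of 1, 2, 4, ..., 512 are exactly 0..1023.
d = [2 ** k for k in range(10)]
sizes = residue_class_sizes(d, 16)
largest_class = max(sizes.values())
```

Here 1024 distinct sums are mapped to 16 classes, so some class must contain at least 1024 / 16 = 64 of them. A test that only looks at the residue cannot tell these 64 values apart.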
Remark 4. An algorithm is a finite set of rules that at every
step tell what to do next. We can implement an algorithm as a
computer program in a second generation language on a von Neumann
machine, and a polynomial-time algorithm can be implemented in this
way so that it requires time and memory that grow polynomially
with respect to the problem dimension. In the case when the
smallest upper bound of $j_n$ in Remark 3 grows exponentially, a
program in a second generation computer language implementing a
polynomial-time algorithm needs to limit the search by branching
instructions. Thus, we can find values of $j_n$ such that the
algorithm uses different branches in solving the subset sum
problem.

Remark 5. It does not seem possible to select a sequence of
specific subset sum problems and to show that no algorithm can
solve this specific sequence of problems in polynomial time. This
is so because it seems plausible that we can create an algorithm
that treats these specific problems in a particular way and can
solve that sequence of problems quickly. Instead, we must first
select the algorithm and then pose to that selected algorithm a
sequence of subset sum problems that are particularly hard for
that specific algorithm. As the algorithm can be any possible
algorithm, the sequence of problems can only be defined by using
some suitable definition of a problem that is difficult for the
selected algorithm, and we cannot give any numerical values for
all of the numbers $c_{k,n}$ in (1.2). We will do the selection by
using the following definition of the computation time of a subset
sum problem.

For convenience, let us select n to be of the form n = 2*+^ for 
some i > 0. This simplifies expressions since it is not necessary 
to truncate numbers to integers. 

Definition 2.

We define a function $f(n)$ that describes (in a certain sense) the
worst computation time for a selected algorithm.

Let the worst in the median n-tuple be as follows. Let

$$h(d_{1,n}, \ldots, d_{n,n}, j_n)$$

be the computation time for deciding if the knapsack

$$(j_n, (d_{1,n}, \ldots, d_{n,n}))$$

has a solution or not. Let

$$\operatorname{Median}_{j_n} h(d_{1,n}, \ldots, d_{n,n}, j_n) \tag{2.4}$$

be the median computation time where $j_n$ ranges over numbers

$$j_n \in \{C + 1, \ldots, 2^{n+1} - 1\} \tag{2.5}$$

satisfying the two conditions

$$j_{n,l} = j_n - C \left\lfloor \frac{j_n}{C} \right\rfloor > 2^{\ldots + 2} \tag{2.6}$$

where $C = 2^{n/2+1}$, and that there is no solution to the knapsack
$(j_n, (d_{1,n}, \ldots, d_{n,n}))$. The values of $j_n$ are computed
separately in the calculation of the median, i.e., no partial
results from previously computed values of $j_n$ are used.

Let $(d_{1,n}, \ldots, d_{n,n})$ range over all knapsack sequences with

$$\left\lfloor \log_2 \sum_{k=1}^{n} d_{k,n} \right\rfloor = n$$

and $d_{k,n} < \ldots$. Because of this requirement at most every second
value of $j_n$ in (2.5) is a solution to the knapsack, i.e., there are
$2^{n}$ combinations of $(c_{1,n}, \ldots, c_{n,n})$ mapped to numbers from
zero to $2^{n+1} - 1$. The worst in the median tuple for $n$ is an
n-tuple $(d_{1,n}, \ldots, d_{n,n})$ (possibly not unique) that maximizes
the median computation time (2.4).

Let this maximal median computation time be denoted by $f(n)$.
Thus

$$f(n) = \max_{(d_{1,n}, \ldots, d_{n,n})} \operatorname{Median}_{j_n} h(d_{1,n}, \ldots, d_{n,n}, j_n). \tag{2.7}$$



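Lemma 3. Let $m$ be fixed and $n$ be a power of $m$. If $f(n)$
satisfies the inequality

$$\frac{n}{m} f\left(\frac{n}{m}\right) \le f(n) \tag{2.8}$$

then $f(n)$ does not grow polynomially with $n$.

Proof. Iterating we get

$$\frac{n}{m} \cdot \frac{n}{m^{2}} f\left(\frac{n}{m^{2}}\right) \le f(n)$$

and iterating up to $k$ yields

$$\left(\prod_{i=1}^{k} \frac{n}{m^{i}}\right) f\left(\frac{n}{m^{k}}\right) \le f(n),$$

i.e.,

$$e^{k \ln n - \frac{1}{2} k^{2} \ln m - \frac{1}{2} k \ln m} f\left(\frac{n}{m^{k}}\right) \le f(n).$$

Setting $k = \frac{\ln n}{\ln m}$ gives

$$n^{\frac{\ln n}{2 \ln m} - \frac{1}{2}} f(1) \le f(n).$$

If $m$ is any fixed number we see that $f(n)$ satisfying (2.8) is not
bounded by a polynomial function of $n$.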
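
The growth rate in Lemma 3 can be checked numerically. The following sketch assumes the extremal case of (2.8), that is, equality in the recursion together with the normalisation f(1) = 1; both assumptions and the function names are chosen here for illustration and are not part of the paper.

```python
def f_extremal(n, m):
    """Smallest f allowed by (2.8): f(n) = (n/m) * f(n/m), with f(1) = 1 assumed."""
    value = 1
    while n > 1:
        if n % m != 0:
            raise ValueError("n must be a power of m")
        value *= n // m
        n //= m
    return value


def closed_form(n, m):
    """m ** (k (k - 1) / 2) for n = m ** k, i.e. n ** ((log_m n - 1) / 2)."""
    k = 0
    while n > 1:
        n //= m
        k += 1
    return m ** (k * (k - 1) // 2)
```

For n = 2^10 and m = 2 the extremal value is 2^45, which already exceeds n^3 = 2^30. Under these assumptions, for any fixed exponent p the ratio f(n) / n^p eventually grows without bound, as the lemma states.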

Lemma 4. Let $n$ be a power of 2. If $f(n) = f_1(n) + f_2(n)$
where $f_1(n)$ is a polynomial function of $n$ and $f_2(n)$ satisfies
the inequality

$$\frac{n}{2} f_2\left(\frac{n}{2}\right) \le f_2(n) \tag{2.9}$$

then $f(n)$ does not grow polynomially with $n$.

Proof. If $f(n)$ is a polynomial function of $n$, then since $f_1(n)$
is a polynomial function of $n$ by assumption, it follows that
$f_2(n)$ must also be a polynomial function of $n$. By Lemma 3,
$f_2(n)$ is not a polynomial function of $n$, thus neither is $f(n)$.

Remark 6. We use the median in Definition 2 instead of the worst
case or the worst in the average case because we need $n/2$
computations that are almost as long as the worst one in (2.9).
In the worst case and in the worst in the average case, a very slow
computation of one value $j_n$ can be the reason for the long
computation time. We include only unsuccessful cases of $j_n$ in the
computation of the median because then there is no need to argue
that the median over unsuccessful $j_n$ is at least as high as the
median over all $j_n$.

3 Construction of a special subset sum problem 

We will use the notation $n_1 = n/2$ throughout this article for
brevity. In this section we will define a special subset sum problem
$K_{1,j_n}$ in Definition 3 and show that it can only be solved by
solving $n_1$ subknapsacks $(j'_i, (d_{1,n}, \ldots, d_{n_1,n}))$ with
different values of $j'_i$.

Definition 3. Construction of $K_{1,j_n}$. We first make a knapsack
where the only solutions must satisfy the condition that exactly
one $c_k$ must be 1 and the others must be zero for $k = n_1 + 1$ to
$k = n$. Let us construct the values $d_{k,n}$, $k = n_1 + 1, \ldots, n$,
of $K_{1,j_n}$ for a given $j_n$. Let $C = 2^{n/2+1}$ and

$$j_{n,h} = C \left\lfloor \frac{j_n}{C} \right\rfloor, \quad j_{n,l} = j_n - j_{n,h} \tag{3.1}$$

be the high and low bit parts of $j_n$. Because of (2.5),
$j_{n,h} \ne 0$. Let

$$d_{n_1+k,n} = j_{n,h} + a_k \tag{3.2}$$

where $0 < a_k < \min\{j_{n,l}, \ldots\}$ are distinct integers and there
exists no solution to the knapsack problem for the knapsack

$$(j'_i, (d_{1,n}, \ldots, d_{n_1,n}))$$

where

$$j'_i = j_{n,l} - a_i. \tag{3.3}$$

Let us also require that the computation time for $j'_i$ is at least
as long as the median computation time $f(n_1)$ for
$(j, (d_{1,n}, \ldots, d_{n_1,n}))$. We can select $j'_i$ fulfilling this
condition because half of the values $j$ are above the median.
Notice that we compute the median only over values $j$ that do not
give a solution to the knapsack. We will also assume that the
$j'_i$ are in the set corresponding to (2.5)-(2.6) for $f(n_1)$, i.e.,

$$j'_i \in \{C' + 1, \ldots, 2^{n/2+1} - 1\} \tag{3.4}$$

satisfying the condition

$$j'_i - C' \left\lfloor \frac{j'_i}{C'} \right\rfloor > 2^{\ldots + 2} \tag{3.5}$$

where $C' = 2^{n/4+1}$. We may assume so because there are enough
values from which to choose $j'_i$.

Remark 7. In (3.2) we select the numbers $a_k$ in such a way that
the $d_{n_1+k,n}$ satisfy the size condition $d_{n_1+k,n} < \ldots$. Because
of the bound (2.6) we have an exponential number of choices for
$a_k$. It is possible to find numbers $j'_i$ such that there is no
solution, since only for about half of the values of $j$ does there
exist a solution for $(j, (d_{1,n}, \ldots, d_{n_1,n}))$. If $j_{n,l}$ is too
small and we cannot find values $j'_i$, we take a carry from $j_n$ in
(3.3) and reselect $a_k$. Because of the lower bound on $j_n$ in
(2.5), $j_{n,h}$ is not zero and we can take the carry. Then $j_{n,h}$
is decreased by the carry.

Remark 8. Exactly one $c_k$ must be 1 and the others must be zero
for $k = n_1 + 1$ to $k = n$. There cannot be more values $c_k = 1$ for
$k > n_1$ because then the higher bits of $j_n$ are not matched. The
unknown algorithm can also try other combinations, but these are
the only possible combinations and the algorithm must also try
them (i.e., check these cases in some way unknown to us). The
sum of the numbers $d_{k,n}$, $k \le n/2$, is less than $2^{n/2+1} - 1$.
Adding one $c_k$ can give a carry and there may not be a solution to
the knapsack because the high bits of $j_n$ do not match, but this is
not an issue since we do not want solutions. We select the n-tuple
so that there are no solutions to the knapsack already because the
lower bits do not match.

Lemma 5. The algorithm cannot stop by finding a solution
because for every $j_n$ none of the $n/2$ values of $j'_i$ solve the
knapsack problem. Every value $j'_i$ gives a computation at least
as long as the median computation time $f(n_1)$.

Proof. We have selected $K_{1,j_n}$ such that
$(j'_i, (d_{1,n}, \ldots, d_{n_1,n}))$ has no solution for any $j'_i$. Thus
the algorithm cannot stop because it finds a solution. By
construction the values $j'_i$ give a computation time at least as
long as the median for the tuple at $k = 1, \ldots, n_1$. Since that
tuple is the worst in the median tuple for $n_1$, the computation
time for each $j'_i$ is at least $f(n_1)$.

Lemma 6. There is no way to discard any values $j'_i$ without
checking if they solve the subknapsack from $k = 1$ to $k = n_1$.
Any case of using the values of $d_{k,n}$ in order to get the result
is considered checking.

Proof. We can select any $a_k$ in such a way that there either
exists a solution or does not exist one. Knowledge from other
$d_{i,n}$ ($i \ne n_1 + k$) cannot give any information on how this
$a_k$ was selected. Thus, the existence of a solution must be checked
using the value $d_{n_1+k,n}$.

Lemma 7. Several values of $j'_i$ cannot be evaluated on the same
run. The median computation time of $K_{1,j_n}$ is at least

$$f_1(n_1) + n_1 f_2(n_1)$$

where $f(n) = f_1(n) + f_2(n)$ and $f_1(n)$ is a polynomial function
of $n$.

Proof. As explained in Remark 3, a polynomial-time algorithm
cannot solve all values of $j'_i$ in the same run because it would
require an exponential amount of memory. As explained in Remark 4,
we can assume that the algorithm is implemented in a second
generation computer language on a von Neumann machine and its code
has branching instructions. These branching instructions define a
branching tree describing the execution of the algorithm for any
input data. The tree is fixed when the algorithm is selected. At
each branching point the input data is divided into a finite number
of classes. Because this division is fixed, we can always find two
values $j'_i$ which are not executed by the same polynomial-time
run. After finding two, we can continue to find three values $j'_i$
which are all executed by different polynomial-time runs of the
algorithm. This can be extended to $n/2$ values $j'_i$. In
particular, it is not possible for such an implementation of an
algorithm to have the property that for any selection of $n/2$
values $j'_i$ there always exists a polynomial-time run such that
the subset sum problem for every value $j'_i$ is computed in the
same run. Instead, we can select the $j'_i$ in such a way that no
two values $j'_i$ are computed in the same run. The runs in Lemma 6
do not need to be completely separate; they can have parts that
are shared, as long as the shared parts are computed in polynomial
time. This is necessarily the case: the runs must share at least
the beginning of the code before branch instructions are reached.
The shared part can be described by a polynomial function $f_1(n)$
and the computation time to solve all these problems is as in
Lemma 4.

Discussion of the method of Section 3 

We explain why the method of Section 3 does not make assump- 
tions on the way the algorithm solves the subset sum problem. 
The method is based on the following logical division: 

C1) It can be that a subtree of possible solutions can be
discarded without checking the subproblems in the subtree.

C2) If a subtree cannot be discarded, all its subproblems must
be checked, but several subproblems may be checked by the same
run of the algorithm.

C3) If a subtree cannot be discarded, all its subproblems must
be checked. If several cannot be checked by the same run of the
algorithm then they must be checked separately.

There is an analogy between checking a proof by a proof checking
algorithm and checking a subset sum problem by a subset sum
problem checking algorithm. In both cases there is an outcome
that stops the search (finding an error, finding a solution) and
an outcome that does not stop the search (not finding an error,
not finding a solution). Let a proof have $n_1$ lemmas and let the
main theorem be that all lemmas are correct. If the proof checking
algorithm concludes that the main theorem does not have an error,
it must have checked all lemmas. If it finds an error, it does not
need to check all lemmas, as it stops at the error.

The proof checking algorithm can discard a set of lemmas without
checking if it can reason that this set is impossible. This
corresponds to case (C1). We assume that the main theorem is
correct, i.e., no lemmas have errors. As there are no errors, the
theorem checking algorithm cannot discard lemmas and it must
check them, i.e., verify if they have an error or not. In the
presented proof for the subset sum problem we construct subset sum
problems that do not have any solutions, corresponding to the case
of a theorem with no errors.

This checking does not need to be done by reading each lemma
in any particular order, nor do we assume any such order. Several
lemmas may be checked at the same time by some argument, as
in case (C2). The proof that the first lemma is correct can, e.g.,
be valid for all lemmas satisfying some properties, and in this
case the proof can verify a set of lemmas. Let us assume that we
know the finite code of the proof checking algorithm and select the
lemmas depending on the algorithm. In the case of lemmas that can
be about anything that is possible to imagine, it is easy to
believe that we indeed can select a set of $n_1$ unrelated lemmas
that must each be checked with a different run by this special
previously selected algorithm. In the case of the subset sum
problem we show that this is the case in Lemma 7. Assuming that we
got $n_1$ lemmas such that each must be solved by a different run of
the selected algorithm, the running times for the lemmas are
essentially additive: there may be some common parts in the runs,
but the running times add as in Lemma 4, which contains an additive
term given by a polynomial function $f_1(n_1)$ to account for the
shared parts of the runs.

This method does not make any assumptions about the algorithm.
It needs the following properties of an algorithm: the algorithm
solves the problem correctly and it has a finite set of
instructions. The algorithm runs in polynomial time. It needs to be
deterministic because with a subroutine capable of guessing
correctly the algorithm could check all lemmas in the same run. A
nondeterministic algorithm could also explicitly check more than a
polynomial number of lemmas in the same run by guessing which need
to be checked. Additionally, it is needed that the space of
possible lemmas has a superpolynomial size, so that we can find
the $n_1$ lemmas. This is a demand on the problem: not all problems
have this property. The method does not restrict the proof checking
algorithm to read the lemmas as they are written. It can work in
any way. This logical division has nothing to do with any special
way of computing subset sums. The logical division can be applied
to any similar problem and it does not make restrictive assumptions
about the algorithm.

Let us look at some concrete examples explaining why it is pos- 
sible to construct a subset sum problem containing parts that 
must be all checked. 

Let $j_n = 21 + 2^{10}$ and let the n-tuple be

$$(d_{1,n}, \ldots, d_{n_1,n}, 9 + 2^{10}, 7 + 2^{10}, \ldots).$$

We only look at the two subproblems: $j'_1 = 21 - a_1 = 21 - 9 = 12$,
$j'_2 = 21 - a_2 = 21 - 7 = 14$. We assume that the carry from the
sum

$$\sum_{k=1}^{n_1} c_{k,n} d_{k,n} + \sum_{k=n_1+1}^{n} c_{k,n} a_{k-n_1}$$

cannot reach $2^{10}$ and therefore exactly one $c_{k,n} = 1$ when
$k > n_1$. The other $c_{k,n} = 0$ for $k > n_1$.

Example 1. We ask whether we need to solve the problems

$$12 = \sum_{k=1}^{n_1} c_{k,n} d_{k,n}$$

$$14 = \sum_{k=1}^{n_1} c_{k,n} d_{k,n}$$

before concluding that

$$20 + 2^{10} = \sum_{k=1}^{n} c_{k,n} d_{k,n}$$

does not have a solution.

Let us assume that all $d_{k,n}$, $k = 1, \ldots, n_1$, are even. We can
use a divisibility condition and conclude that 20 cannot be made
from a subset sum of even $d_{k,n}$ and one odd number. We do not
need to check if 12 and 14 can be made as a subset sum of the
smaller knapsack up to $n_1$, although they probably can. This is
case (C1) in the logical division: we can discard a whole subtree
without checking the subproblems. We do not have this possibility
in Lemma 6: $a_i$ is the lower bit part of $d_{n_1+i,n}$. This is why
we cannot discard the subproblems without checking. They are all
possible. So, we conclude that the subproblems in Lemma 6 must
be checked. They may be checked at the same time by the same
run of the algorithm and we must go to (C2).

Example 2. Let us take another example. Let $j'_1 = 21 - a_1 =
21 - 8 = 13$, $j'_2 = 21 - a_2 = 21 - 6 = 15$. Let us assume that all
$d_{k,n}$, $k = 1, \ldots, n_1$, are even. There are two subproblems:

$$13 = \sum_{k=1}^{n_1} c_{k,n} d_{k,n}$$

$$15 = \sum_{k=1}^{n_1} c_{k,n} d_{k,n}.$$

We ask whether we have to check these subproblems before
concluding that

$$21 + 2^{10} = \sum_{k=1}^{n} c_{k,n} d_{k,n}$$

does not have a solution.

We must check both of these subproblems since they are possible,
but we can check both of them at the same time by computing
$j \bmod 2 = 1$ and noticing that there cannot be a solution to

$$21 + 2^{10} = \sum_{k=1}^{n} c_{k,n} d_{k,n}.$$

This calculation checks both of the subproblems. That is, we
know that

$$13 = \sum_{k=1}^{n_1} c_{k,n} d_{k,n}$$

$$15 = \sum_{k=1}^{n_1} c_{k,n} d_{k,n}$$

are also impossible.

In case (C2), when several subproblems are solved in the same
run, the algorithm does not look at each subproblem separately
and compute subset sums for them. Implicitly it solves all
subproblems, i.e., the algorithm has proven that no subproblem
has a solution, unlike in Example 1 where the algorithm did not
solve the subproblems at all.
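
The parity argument of Example 2 can be replayed on a small instance. In the sketch below the lower half tuple `lower_half` is a hypothetical choice of even numbers (the paper does not give numerical values for the $d_{k,n}$), and the helper names are likewise illustrative.

```python
from itertools import combinations


def has_subset_sum(values, target):
    """Brute-force subset sum check, meant only for tiny illustrative instances."""
    return any(
        sum(combo) == target
        for size in range(len(values) + 1)
        for combo in combinations(values, size)
    )


def parity_rules_out(values, target):
    """True when every value is even and the target is odd, so no subset can match."""
    return all(v % 2 == 0 for v in values) and target % 2 == 1


lower_half = [2, 4, 6, 10]  # hypothetical even d_{k,n}, k = 1, ..., n_1
subtargets = [13, 15]       # the values j'_1 and j'_2 of Example 2
both_excluded = all(parity_rules_out(lower_half, t) for t in subtargets)
```

A single parity test excludes both subtargets at once, and the brute-force search agrees that neither 13 nor 15 is a subset sum. For an even target such as 12 the parity test says nothing, and that subproblem has to be examined in some other way.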

We are checking the subproblems by noticing that $j_n$ belongs to
an infinite set of odd numbers. The set of odd numbers that are
smaller than an exponential upper bound is exponential. Thus,
it is possible to check an exponential number of problems at the
same time. The argument in the proof is not that the set of $j'_i$
has an exponential size, but that it has an exponential range: it
goes through all numbers up to some exponential upper bound.
In this range we also have the even numbers. The numbers in a
certain range do not have common properties, such as divisibility
by some number.

Let us see how Lemma 7 shows that (C2) cannot happen. It is
because we can select $j'_i$ freely from an exponential set ranging
over all numbers with some exponential upper bound. If for each
choice of the $n_1$ values $j'_i$ the algorithm can check the subset
sums in the same run, then the algorithm checks an exponential
number of values $j'_i$ in the same run, where $j'_i$ can range over
all numbers with an exponential upper bound and some lower
bound. This is not possible for a polynomial-time algorithm.

Example 3. Let $j'_1 = 21 - a_1 = 21 - 9 = 12$, $j'_2 = 21 - a_2 =
21 - 6 = 15$. In this case all $d_{k,n}$, $k = 1, \ldots, n$, cannot be
even. There are two subproblems:

$$12 = \sum_{k=1}^{n_1} c_{k,n} d_{k,n}$$

$$15 = \sum_{k=1}^{n_1} c_{k,n} d_{k,n}.$$

Again the algorithm must check both subproblems before
concluding that

$$21 + 2^{10} = \sum_{k=1}^{n} c_{k,n} d_{k,n}$$

does not have a solution. The second subproblem can be checked
by divisibility considerations, while the first must be checked in
some other way, probably in another run of the algorithm. This
is the case (C3). As we can select the $a_i$, we can select
subproblems such that they cannot be checked in the same run. It
depends on the algorithm which problems must be solved in different
runs, and therefore we must look at the algorithm before selecting
the $a_i$.

4 On the inequality (2.9) 

In this section we try to establish the inequality (2.9) for any
algorithm solving the knapsack problem. Let the algorithm be
chosen. We selected a tuple $K_{1,j_n}$ for every $j_n$ and showed in
Lemma 7 that the median computation time for the set of $K_{1,j_n}$
is at least as high as the left hand side of (2.9). However, the set
of $K_{1,j_n}$ depends on $j_n$ and we should get a constant n-tuple
that can be compared to the worst in the median knapsack with the
computation time $f(n)$.

Let this constant n-tuple be called $K2$. $K2$ has at most as long a
median computation time as the worst in the median tuple. What
we have to show is that the set of $K_{1,j_n}$ is not harder to solve
for the algorithm than $K2$. In order to show this, we first define
another set of n-tuples $K_{3,j_n}$ and show that $K_{1,j_n}$ is not
harder than $K_{3,j_n}$, and then try to show that $K_{3,j_n}$ is not
harder than $K2$.

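Definition 4. Construction of $K_{3,j_n}$. Let $j_n$ be given and let
us define an n-tuple $K_{3,j_n}$ by specifying the elements

$$
\begin{aligned}
d_{2,k} &= d_{k,n} && (k = 1, \ldots, \tfrac{n}{2}) \\
d_{2,k} &= e_1 && (k = \tfrac{n}{2} + 1, \ldots, \tfrac{3n}{4}) \\
d_{2,k} &= e_2 && (k = \tfrac{3n}{4} + 1, \ldots, n - 1) \\
d_{2,n} &= j_{n,h}.
\end{aligned}
\tag{4.1}
$$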

We select two nonnegative integers < i = 1,2. The 

selected $e_1$ and $e_2$ are so small that if $c_n = 0$ the higher bits of $j_n$
are not matched because there is no carry.

Remark 9. This n-tuple has a simple upper half tuple. The sum 
of the numbers dk^n, k < ^ is less than 2^+^ — 1. It is always 
necessary to set $c_n = 1$ and this satisfies the upper half bits of $j_n$.

Definition 5. Construction of $K_2$. Let us define $K_2$ as an
n-tuple with elements $(d_{0,1}, \ldots, d_{0,n})$ where each $d_{0,k} <$

let n-tuple $(d_{1,n}, \ldots, d_{n,n})$ be the worst in the median tuple for



$\frac{n}{2}$. We define

as the values of the elements of $K_2$ for $k = 1, \ldots, n_1$. The numbers
$e_1$ and $e_2$ are as in $K_{3,j_n}$ and we define the elements of $K_2$ for
$k = n_1 + 1$ to $k = n$ as in (4.3).

Thus, $K_2$ has the same lower half tuple elements as $K_{3,j_n}$.

Remark 10. In $K_{3,j_n}$ our chosen algorithm quickly finds a solution
and stops. The tuple $K_2$ can be split into two tuples: the lower
half tuple with elements smaller than $C$ and the upper half tuple
that has the higher bit parts. In $K_2$ the algorithm usually does
not stop at a solution of the lower half tuple, since the upper
half tuple is usually not satisfied by the $c_k$ that satisfy the lower half
knapsack. We construct another algorithm A2 that is similar to
the chosen algorithm A1 except that it does not stop when it
finds a solution. Thus, for $j_n$ that yields no solution to $K_2$ the new
algorithm A2 works in the same way as the chosen algorithm. The cases
when $K_2$ has a solution are not counted in the median time
of deciding if $K_2$ has a solution, since we take the median over
unsuccessful cases of A1 only. We compute the median time for
A2 solving $K_{3,j_n}$ so that the median is taken over values of $j_n$ for
which $K_2$ does not have a solution.

Lemma 8. The set $K_{1,j_n}$ takes at most as long to compute as the
set $K_{3,j_n}$.

Proof. We compare the algorithm A2 solving the set of $K_{3,j_n}$ and
the algorithm A1 solving the set of $K_{1,j_n}$. In $K_{3,j_n}$ the indices
k > ni yield ^2+^ values of j that could be made as subset sums 
from the worst in the median $n_1$-tuple in the indices $k \le n_1$.
As we can select $e_1$ and $e_2$ from an exponential set of numbers,
we may assume that the numbers $j$ are sufficiently well randomly
distributed over the possible range of the numbers $j$.

$$
\begin{aligned}
d'_k &= C d_{0,k} + e_1 && (k = \tfrac{n}{2} + 1, \ldots, \tfrac{3n}{4}) \\
d'_k &= C d_{0,k} + e_2 && (k = \tfrac{3n}{4} + 1, \ldots, n - 1)
\end{aligned}
\tag{4.3}
$$

Therefore
over half of the values $j$ are likely to be on the range (3.4) and
about half of the values of $j$ that do not give a solution in $K_2$
are likely to yield a longer computation time than $f(n_1)$. We can
select the values $j'_i$ in $K_{1,j_n}$ to be the values that give the smallest
possible computation time larger than or equal to $f(n_1)$. The time to
compute the sums of about i ('^+4)'^ values of j for i^3j„ that are 
above /(^i) is larger than the time to compute | values of 
$j'_i$ for $K_{1,j_n}$ that are slightly above $f(n_1)$. Most of the numbers $j$ in
$K_{3,j_n}$ will also require different runs of the algorithm, as one run
can only give an answer to a polynomial number of values $j$ and
the values $j$ in $K_{3,j_n}$ take values sufficiently randomly over an
exponential range.

Remark 11. The median computation time in (2.4) is calculated
over the no instances only. Thus, yes instances are ignored. It is
sufficient that there are at least some no instances so that (2.4)
can be calculated. We give an argument that estimates the number
of solutions to the knapsack problem $(j_n, K_2)$. The argument
makes use of averages, but it is sufficient for showing that
there are some no instances for the computation of (2.4) if the
upper bits of $K_2$ are selected in a suitable way. Indeed, a random
selection of these bits is likely to yield many no instances. Let us
mention that Lemmas 9-13 are not used in the proof of $P \ne NP$
and the probabilistic nature of the proofs of these lemmas has no
relevance to Theorems 1 and 2.

Lemma 9. There are in average 2^ solutions possible choises 
of {ci, . . . , Cn) that give the same sum T,k=i Ckdo,k- 

Proof. The number of combinations of is 2^ and the sum 
T,k=ido,k is at most 2?. There are fewer combinations that yield 
very small or large sums and most sums are in the middle ranges. 
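
The qualitative claim in this proof, that few combinations give very small or very large sums while most land in the middle range, can be checked numerically on a small case. The sketch below counts, for a hypothetical 10-element tuple (the integers 1 to 10, chosen only for illustration and not taken from the construction), how many choices of $(c_1, \ldots, c_n)$ produce each sum.

```python
from collections import Counter


def subset_sum_counts(items):
    """Map each reachable sum to the number of subsets that produce it."""
    counts = Counter({0: 1})  # the empty subset gives sum 0
    for d in items:
        updated = Counter(counts)
        for s, c in counts.items():
            updated[s + d] += c  # subsets that additionally include d
        counts = updated
    return counts


# Hypothetical tuple, for illustration only.
items = list(range(1, 11))
counts = subset_sum_counts(items)

total_subsets = sum(counts.values())   # 2 ** len(items)
distinct_sums = len(counts)
print("average subsets per sum:", total_subsets / distinct_sums)
print("subsets giving the smallest sum:", counts[min(counts)])
print("subsets giving a middle sum:", counts[sum(items) // 2])
```

The average is the number of combinations divided by the number of distinct sums, which is the kind of averaging argument the lemma relies on. It says nothing about any particular sum.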

Lemma 11. We can select the numbers do^k in such a way 
that there are in average about 2 5 solutions possible choises of 
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(ci, . . . , c„J that give the same sum E^Li Ckdo^k- 

Proof. Most random selections of the numbers do^k give this re- 
sult. There are fewer combinations that yield very small or large 
sums and most sums are in the middle ranges. 

Lemma 12. The lower half tuple in the indices $k = n_1 + 1, \ldots, n$
has only a polynomial number of possible values $j$.

Proof. These numbers are

$$j = \sum_{k=n_1+1}^{n} c_k (d'_k - C d_{0,k}) = k_1 e_1 + k_2 e_2 \tag{4.4}$$

where $0 \le k_1 \le \frac{n}{4}$ and $0 \le k_2 \le \frac{n}{4} - 1$.
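
As a quick numerical check of (4.4), the sketch below enumerates $k_1 e_1 + k_2 e_2$ over the ranges $0 \le k_1 \le \frac{n}{4}$ and $0 \le k_2 \le \frac{n}{4} - 1$ and compares the number of distinct values with the number of index pairs $(k_1, k_2)$, which is $(\frac{n}{4} + 1)\frac{n}{4}$. The values of `n`, `e1` and `e2` are hypothetical and the function name is an assumption made for illustration.

```python
def lower_half_values(n, e1, e2):
    """Distinct values k1*e1 + k2*e2 with 0 <= k1 <= n/4 and 0 <= k2 <= n/4 - 1."""
    if n % 4 != 0:
        raise ValueError("n is assumed to be divisible by 4")
    return {
        k1 * e1 + k2 * e2
        for k1 in range(n // 4 + 1)   # k1 = 0, ..., n/4
        for k2 in range(n // 4)       # k2 = 0, ..., n/4 - 1
    }


# Hypothetical parameters, for illustration only.
n, e1, e2 = 16, 3, 5
values = lower_half_values(n, e1, e2)

pairs = (n // 4 + 1) * (n // 4)    # number of (k1, k2) pairs, quadratic in n
subsets = 2 ** (n // 2 - 1)        # 0/1 choices over the n/2 - 1 elements e1, e2

print(len(values), "distinct values from", pairs, "pairs;", subsets, "subsets")
```

Since only the counts $k_1$ and $k_2$ matter, the number of distinct values is bounded by the number of pairs, while the number of subsets of the same elements grows exponentially.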

Remark 12. The elements in the worst in the median tuple 
for ni satisfy dk,n < ^^^f^ because we only maximize over such 
elements. Also < Thus, there is no carry from the lower 

half tuple to the upper half tuple. 

Lemma 13. It is possible to compute the median (2.4) for K2. 

Proof. Let us assume that the values $c_k$ are fixed for the indices
$k \ge n_1 + 1$. This fixes some value $j$ that must be obtained from
the knapsack in the indices $k = 1, \ldots, n_1$ as the subset sum. By
Lemma 12 there are only possible values j. The upper half 
tuple yields about 25 possible solutions for a given j in the in- 
dices A; = 1, . . . , ni by Lemma 11. The worst in the median tuple 
in the lower half tuple has | elements, thus 2^ possible numbers 
can be constructed as sums E^Li c^d'j^ in the lower half tuple. The 
set of the about 2 5 possible solutions of the upper half tuple for a 
randomly selected j is a small subset of all possible combinations 
of Ck in the lower half tuple in the indices k = 1, . . . ,ni. The 
probability that any of the possible solutions from the upper half 
tuple is a solution of the lower half tuple is only on the range 
of ^^^2-?. The events of selecting the upper half tuple, the 
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lower half tuple, and the value j can all be considered indepen- 
dent events. There are only a polynomial number of sums (4.4), 
thus when is selected, there are only a polynomial number of 
possible values for the lower half of j in (j, (d'l, . . . , d'^_^)) . For a 
randomly selected jn there are then only a polynomial number of 
$c_k$, $k \le n_1$, that satisfy the lower half bits of $j_n$. The choice of $c_k$,
$k \le n_1$, fixes the upper half of $j$. We are left with an upper half
knapsack problem for the indices A; = ni + 1, . . . , n. In this knap- 
sack problem the elements have the size about 2"^^ and there are 
ni elements. Thus, for a randomly selected jn we expect about 
one solution. The solution is constrained by the demand that the 
lower half bits give j, i.e., not all combinations are possible. We 
conclude that we get at least some no instances for computation
of (2.4) for some choice of $(d_{0,1}, \ldots, d_{0,n})$.

Remark 13. In order to solve the subset sum problem for $K_2$ the
algorithm should find a common solution to two knapsacks, i.e.,
both the upper bits and the lower bits knapsacks in $K_2$ must be
solved with the same numbers $(c_1, \ldots, c_n)$. As we may choose any
difficult knapsack $(d_{0,1}, \ldots, d_{0,n})$ for the upper bits of $K_2$, it seems
to be a much harder problem to solve $K_2$ than to solve $K_{3,j_n}$,
where the upper bits are trivial to satisfy. However, it might
theoretically be faster to solve $K_2$, since the algorithm does not need
to check that there are no solutions to the lower half tuple, only
that none of the solutions to the upper half tuple are solutions
to the lower half tuple. This is as far as we get in trying to prove
the inequality (2.9) directly. We cannot show that there exist
upper half tuples $(d_{0,1}, \ldots, d_{0,n})$ that guarantee that $K_2$ is
slower to solve than the set of $K_{3,j_n}$, even though it seems obvious
that this claim is true. In Section 5 we make a different argument
by considering another algorithm that is faster than the original
algorithm.

Discussion of the method of Section 4. 

Figure 1 shows the main idea of the proof. The set $K_{1,j_n}$ has the
worst in the median $n_1$-tuple in the left side and the right side
has numbers from which it is necessary to select exactly one in
order to satisfy the high bits of $j_n$. This yields $n_1$ separate subset
sum problems and we get the computation time corresponding
to the left side of (2.9). The set $K_{3,j_n}$ has only one element which
has high order bits and it must always be selected in order to
satisfy the high bits of $j_n$. There is the same worst in the median
$n_1$-tuple and the remaining $n_1 - 1$ elements can be assigned in any
way yielding of the order knapsack problems. Therefore, it is 
easy to believe that $K_{1,j_n}$ is easier to solve than $K_{3,j_n}$ for almost
any $j_n$. The n-tuple $K_2$ has some difficult upper half knapsack
problem which has to be satisfied with the same values $c_k$ as the
lower half knapsack. It certainly seems that $K_2$ is more difficult
to solve than $K_{3,j_n}$. Finally, the inequality from $K_2$ to the worst
in the median n-tuple is obtained directly by the definition of
what the worst means.

Fig. 1. The idea of the proof.



We have not specified a generic computational model for the
algorithm and are presenting a proof using a lower bound. It is
sometimes stated that specifying the computation model used is
important for a lower bound proof. This may be the case for good
lower bounds but a weak lower bound does not require knowing
the algorithm. Lower bounds of functions are taken in several 
branches of mathematics and no computation model in the sense 
of the theory of computational complexity is usually specified. 
Thus, a computation model is not necessary for proofs of lower 
bounds. The presented proof is a simple case of approximating 
functions as is done in many fields of mathematics. As an ex- 
ample showing that there is no need to have a computational 
model for the algorithm for deriving a weak lower bound let us 
consider a hypothetical algorithm that has the property that it 
always wins a game of solitaire provided that there is a way to 
win the game. There is a lower bound to the running time of 
such an algorithm derived directly from the problem it solves. In 
order to win a game of solitaire, all cards must be turned face 
up in the end. We know how many cards are face down in the 
beginning. Thus, the time to flip the cards that are face down 
is a lower bound to the running time of any algorithm with the 
stated property. No computational model needs to be specified 
for this lower bound. The lower bound is the property determined 
by the problem posed to the algorithm. In the presented proof 
the situation is similar. We present an unknown algorithm with 
a carefully selected set of subset sum problems. Any algorithm 
solving the selected problems must do certain things determined 
by the problem, and the weak lower bound is a result of these 
necessary things. A computational model, like the Turing model, 
describes how an algorithm solves the problem. The proof pre- 
sented in this article only looks at what the algorithm necessarily 
must do in order to solve the presented problems, not how it does 
it. Tracking how an unknown algorithm might do things is difficult
and therefore no computational model is used in the proof.

5 Changing the algorithm



We will construct another algorithm A3 that is at least as fast
as the original algorithm A1 and for which the inequality (2.9)
can be established. It follows that A3 is not a polynomial-time
algorithm. Thus, A1 is not a polynomial-time algorithm either.

We use the following notations. Let $K_a = (d_1, \ldots, d_k)$ and $K_b =
(d'_1, \ldots, d'_k)$ be k-tuples. Then we write

$$(K_a | K_b) = (d_1, \ldots, d_k, d'_1, \ldots, d'_k)$$

and

$$(K_a + 2^n K_b) = (d_1 + 2^n d'_1, \ldots, d_k + 2^n d'_k).$$
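
The two tuple operations can be written down directly. The sketch below is a minimal illustration with hypothetical function names (`concat`, `shifted_sum`) and toy values. It represents tuples as Python tuples of non-negative integers and $2^n$ as a left shift by `n` bits.

```python
def concat(ka, kb):
    """(Ka | Kb): the elements of Ka followed by the elements of Kb."""
    return tuple(ka) + tuple(kb)


def shifted_sum(ka, kb, n):
    """(Ka + 2^n Kb): elementwise d_i + 2**n * d'_i for tuples of equal length."""
    if len(ka) != len(kb):
        raise ValueError("both tuples must have the same length")
    return tuple(a + (b << n) for a, b in zip(ka, kb))


# Toy values, for illustration only.
ka, kb = (1, 2), (3, 4)
print(concat(ka, kb))          # (1, 2, 3, 4)
print(shifted_sum(ka, kb, 4))  # (49, 66)
```

When every element of `ka` is below $2^n$, the second operation simply places the element of `kb` in the bits above position `n` of the corresponding element of `ka`.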

Let Wn be the worst in the median tuple for n (thus, it is an 
n-tuple). Let 

An = {2Un -i- An\W^^) (5.1) 

define recursively $A_n$ starting from some large $n$ and an $A_n$ that the
algorithm A1 finds difficult for subset sum problems. Here we let
$n$ be divisible by 4. Each number in $W_n$ has bit length less than
$n$ and the numbers in $A_n$ have bit lengths less than $n_1 = \frac{n}{2}$.

Definition 6. The algorithm A3. Let us construct the new
algorithm A3. For each $n$ let us select a fixed n-tuple $(d_{0,1}, \ldots, d_{0,n})$
using the recursive definition (5.1) and let A3 work as follows.
It replaces $j_n$ by $j_n + 2^n C^{-1} j_{n,h}$ and it replaces each $d_{k,n}$ by
$d_{k,n} + 2^n d_{0,k}$. Then A3 either solves the extended knapsack with
A1, or solves the extended knapsack by ignoring the bits above
$2^{n-1}$ (i.e., solves the original knapsack) with A1, whichever is
faster. This means that A3 makes a non-deterministic decision,
but it does not need to be a deterministic algorithm. We only
want it to be at least as fast as A1. The time taken by extending
knapsacks and deciding which way to solve the knapsack is not
counted into the median computation time for A3.

Remark 14. Because of Definition 6 the algorithm A3 is
never slower than A1. If there is a solution to the original
knapsack, there may not be a solution to the extended knapsack, since
no solution to the original knapsack needs to satisfy the new
high bits. This means that A3 may make errors if there exists
a solution for the knapsack problem $(j_n, K_2)$. This error is not
important for the following reason. The algorithm computing
the median does not rely on the (possibly wrong) answers given
by the tested algorithm to decide whether there is a solution; it
makes an exhaustive search and therefore knows if there is a
solution to the knapsack. The algorithm computing the median is
in any case an exponential-time algorithm and there is no need to
worry about its slowness. Therefore only such values $j_n$ that do
not have a solution in the original knapsack are counted in the
median. The erroneously solved cases are not included. The
difference between the original algorithm A1 solving extended
knapsacks and the new algorithm A3 solving the original knapsacks
is in the search space for $j_n$. When we compute the median for
A3, the number $j_n$ loops over values where A1 cannot find a
solution for the original knapsacks. A3 can search through a smaller
space of possible combination candidates by using the extended
knapsack and therefore it can be faster than A1 in solving the
original knapsack problems.
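
The observation in Remark 14, that a solvable original knapsack can become unsolvable once high bits are added, can be seen on a toy instance. The sketch below uses hypothetical numbers and a generic `shift` parameter in place of the paper's $2^n$ and $C^{-1} j_{n,h}$. It assumes the low parts are small enough that no carry reaches the high bits, so solving the extended knapsack amounts to solving the low and the high knapsack with the same $c_k$.

```python
from itertools import product


def solutions(target, items):
    """All 0/1 vectors c with sum(c[k] * items[k]) == target (brute force)."""
    return [
        c
        for c in product((0, 1), repeat=len(items))
        if sum(ck * dk for ck, dk in zip(c, items)) == target
    ]


def extend(target, items, high_target, high_items, shift):
    """Place a second knapsack in the bits above position `shift`.

    Requires sum(items) < 2**shift and target < 2**shift so that the low
    parts never carry into the high bits.
    """
    if target >= 1 << shift or sum(items) >= 1 << shift:
        raise ValueError("low part could carry into the high bits")
    new_target = target + (high_target << shift)
    new_items = [d + (h << shift) for d, h in zip(items, high_items)]
    return new_target, new_items


# Hypothetical toy data.
items, target = [3, 5, 6], 8        # original knapsack: 3 + 5 = 8
high_items, shift = [1, 2, 4], 4    # high-bit knapsack; sum(items) = 14 < 16

compatible = extend(target, items, 3, high_items, shift)    # high part 1 + 2 = 3
incompatible = extend(target, items, 5, high_items, shift)  # high part needs 1 + 4

print(solutions(target, items))   # solutions of the original knapsack
print(solutions(*compatible))     # the same c also satisfies the high bits
print(solutions(*incompatible))   # original is solvable, the extension is not
```

In the incompatible case the only solution of the original knapsack does not satisfy the high bits, which is the kind of error the remark describes. The extension can only shrink the set of candidate combinations.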



Remark 15. We have defined that A3 solves problems by first
expanding the knapsack problems, thus we know something of
A3. Lemma 14 is given only for showing how A3 solves knapsacks
by extending them. It is not used in the proof of $P \ne NP$. The
conclusion in Lemma 14 is that for A3 it is more difficult to
solve $K_{1,j_n}$ than to solve a set of knapsack problems $(j'_i, W_{n_1})$.
However, the sense in which this is more difficult is only that it is
more difficult to solve two knapsack problems than to solve one.
This does not say anything about the time needed. A3 may have
some clever way to solve two problems faster than one. Our main
interest is in Lemma 15. A calculation similar to that in Lemma 14
shows in Lemma 15 that for A3 it is at least as slow to compute
the median for $K_2$ as for the set $K_{3,j_n}$.

Lemma 14. Let the algorithm be A3. It is more difficult to
conclude that $K_{1,j_n}$ has no solutions than it is to conclude that
$n_1$ values of $j'_i$ do not solve $(j'_i, W_{n_1})$.

Proof. The algorithm A3 solves (jVn, W^m) by expanding it into 

(jn, + ^'IC'-'jn^h, Wn, + 2t^,J (5.2) 

where jn^^h = C"[jniC"~^J, C is as in (3.5). Let Bi be defined by 
^i,jn = 0^ni\Bi). The algorithm A3 solves {jn, KijJ by expand- 
ing it as 

(i„ + 2"C-Vn,/.,i^i,i„ + 2M,) 

This implies solving 

if, + 2-i + 2"+^i, Wn, + 2M,, + 2-+^ A„J (5.3) 

for some j- and j. It is also necessary to solve 

[jn + 2-C-%h - i- - 2^i - 2"+^i, 5i + 2-l^,J. (5.4) 

Solving (5.3) is the same as solving 

(j,' + 2ij, iy„, + 2U,J. (5.5) 

Solving (5.4) is the same as solving the following two knapsack 
problems 

iC-%h - J - 2tj, WnJ , Un - B,). (5.6) 

$B_1$ allows $n_1 = \frac{n}{2}$ solutions $j'_i$. Let us compare (5.2) and (5.5). In
order to show that there are no solutions to (5.2) it is enough to
check one value $j = C^{-1} j_{n_1,h}$. In order to show that there are no
solutions to (5.5) it is necessary to check all values $j$ that satisfy
the left side knapsack in (5.6). It follows that it is more difficult
to conclude that $K_{1,j_n}$ has no solutions than it is to conclude that
$n_1$ values of $j'_i$ do not solve $(j'_i, W_{n_1})$.

Lemma 15. Let the algorithm be A3. It is at least as slow to
compute $K_2$ as it is to compute $K_{3,j_n}$.

Proof. An easy verbal explanation of the following proof is that
$K_{3,j_n}$ is solved by A3 by first extending it with $(d_{0,1}, \ldots, d_{0,n})$. We
define $K_2$ in such a way that the upper half tuple $(d_{0,1}, \ldots, d_{0,n})$
in (4.2) is the same $(d_{0,1}, \ldots, d_{0,n})$ as in Definition 6. When $K_2$ is
solved by extending it with $(d_{0,1}, \ldots, d_{0,n})$ there are two identical
extensions in $K_2$. The complexity of solving two identical
extensions is the same as solving one extension, especially for A3, which
selects the faster alternative of either first extending and solving
with A1, or not extending and solving directly with A1. Thus, $K_2$
is essentially as difficult to solve for the new algorithm as $K_{3,j_n}$.
We give this argument more precisely in what follows. Let us
define $B_2$ by $K_{3,j_n} = (W_{n_1}, B_2)$. By the proof of Lemma 8 the tuple
i^3j„ yields about rn values of j that may give solutions to 

0? W^nJ- The median computation time for K2 is computed from 
knapsacks K2) that are expanded by A3 as 

[jn + rC-'3n,h. K2 + (5.6) 

= {jn + 2^C-^3n^h. i^3,,„ - Jn,/^(0, . . . , 1) + + 2M,) 

Here we have inserted the definition of $K_2$ from (4.2). The median
over the set $K_{3,j_n}$ is computed from

Un + rC-'jn,h. K,,^ + 2M„) (5.7) 

and there is the additional condition that $c_n = 1$. We notice
that in $K_2$ there are identical repeated rows but otherwise the
problems (5.6) and (5.7) are the same. Thus, the median over $K_2$
is at least as slow to compute as the median over the set $K_{3,j_n}$.
It may be a bit easier to solve $K_{3,j_n}$ because the condition $c_n = 1$
restricts possibilities.

Lemma 16. Let the algorithm be A3. It is possible to compute 
the median (2.4) for K2. 

Proof. Let us approximate the number of $j_n$ that do not give
solutions to $K_2$. Let us expand $A_n$ in (5.6) by using the definition
(5.1). The knapsack in indices $1, \ldots, n_1$ has the expression

At the indices $n_1 + 1, \ldots, n$ the solution must solve $(B_2, j_n - j'_i)$ in
the low bits. Let there be $m$ values of $j'_i$ that solve $(B_2, j_n - j'_i)$.
$B_2$ has only two different values, $e_1$ and $e_2$, in (4.1). Thus, many
combinations of $c_{n_1+1}, \ldots, c_n$ yield the same value $j'_i$. Let us divide the
combinations of $c_{n_1+1}, \ldots, c_n$ into $m$ sets in such a way that each
combination in the set yields the same value $j'_i$. Each combination
$c_{n_1+1}, \ldots, c_n$ determines a value $j_0$ that the upper bits knapsack
$(W_{n_1}, j_0)$ in the indices $n_1 + 1, \ldots, n$ should satisfy. About half of
the values $j_0$ yield a solution to $(j_0, W_{n_1})$. At the indices $1, \ldots, n_1$
there is the knapsack $(j'_i, W_{n_1})$. It has a solution for about half of
the values $j'_i$. Each solution determines a combination $c_1, \ldots, c_{n_1}$
which determines a value $j$ to the upper bits in indices $1, \ldots, n_1$.
The numbers $j_0$ and $j$ must satisfy the equality

i + 2^i+jo = (l + 2?)C-Vn,/.. (5.8) 

Let us select $j'_i$. In about half of the cases there is no such $j$
since $(j'_i, W_{n_1})$ does not have a solution. (If $j$ exists, it is usually
unique because the 2? combinations of Ck in Wn, are mapped to 
the range 2?+^ and the most probable cases are that there is no 
solution or one solution.) When $j$ is obtained, the equation (5.8)
determines the value $j_0$. This $j_0$ must belong to the set of
possible values for $j_0$ for this $j'_i$. There are $m$ sets. Though the sets do
not have the same sizes, we can on average approximate that
the probability of $j_0$ belonging to the correct set is about $\frac{1}{m}$.
If it belongs to the correct set, it solves $(j_0, W_{n_1})$ with probability
0.5. Then it is a solution to $K_2$. Thus, for a selected $j'_i$ there is
a solution with the probability that $j$ exists (about 0.5) times the
probability that $j_0$ is in the set (about $\frac{1}{m}$) times the probability
that $j_0$ solves $(j_0, W_{n_1})$ (about 0.5).
There are $m$ values $j'_i$. This yields the approximation to finding
a solution to $K_2$ as $m \cdot 0.5 \cdot \frac{1}{m} \cdot 0.5 = 0.25$. This rough approximation
shows that at least half of the values $j_n$ do not give a solution. In
(2.4) the exact proportion of successful and unsuccessful values of
$j_n$ is not important as long as there are at least some values to
compute the median. The argument shows that the proportion
of unsuccessful values of $j_n$ does not approach zero.

Lemma 17. Let the algorithm be A3. The algorithm cannot
stop because of finding a solution because for every $j_n$ none of the
$\frac{n}{2}$ values of $j'_i$ solve the knapsack problem. Every value $j'_i$ gives at
least as long a computation as the median computation time $f(n_1)$.

Proof. The proof for Lemma 5 does not need changes: there are
no solutions for $(j'_i, W_{n_1})$ for any of the $\frac{n}{2}$ values from $B_1$ and the
selection of $j'_i$ can be made to give at least as long a computation
as the median computation time $f(n_1)$.

Lemma 18. Let the algorithm be A3. There is no way to discard
any values $j'_i$ without checking if they solve the subknapsack from
$k = 1$ to $k = n_1$. Any case of using the values of $d_{k,n}$ in order to
get the result is considered checking.

Proof. The proof of Lemma 6 uses the property that $K_{1,j_n}$ has
a simple upper half tuple. It may appear that this property has
changed since the algorithm extends the knapsacks by adding a
nontrivial knapsack to high bits. However, the knapsack $K_{1,j_n}$
has not changed. The proof of Lemma 6 is true for any (arbitrary)
algorithm solving knapsack problems. The algorithm can solve in
any way, e.g. by adding an extension. What is important is that
the algorithm needs to check if the value $j'_i$ yields a solution. If
there exists at least one solution to the upper knapsack then the
new algorithm cannot decide that there is no solution without
checking the value of $j'_i$ in some way. This condition is valid since
there are many possible solutions to the upper half knapsack
problem (j, A^J for a selected jn in almost all cases: 2" combina- 
tions of Cfc are mapped to 2? values of j. If the upper half of jn is 
very large or very small there are only few possible combinations
to the upper half knapsack and A3 may conclude fast that there
are no solutions because the upper half bits do not match. These
cases are so rare (necessarily a polynomial size set) that they do
not affect the median. Thus, the proof of Lemma 6 also proves
Lemma 18.

Lemma 19. Let the algorithm be A3. Several values of $j'_i$ cannot
be evaluated on the same run. The median computation time of
$K_{1,j_n}$ is at least

$$f_1(n_1) + n_1 f_2(n_1)$$

where $f(n) = f_1(n) + f_2(n)$ and $f_1(n)$ is a polynomial function of
$n$.

Proof. If the upper half knapsack can be solved by polynomially
many possible combinations for $c_1, \ldots, c_{n_1}$ only, then the lower
half knapsack $(j, W_{n_1})$ can be solved by checking these
combinations. Then there is no need to solve the knapsack $(j', W_{n_1})$ in the
hard way. As long as the search space of possible combinations of
$c_1, \ldots, c_{n_1}$ gives an exponential number of values $j$ that are
possible in the upper bits but do not solve the lower bit knapsack in
indices $1, \ldots, n_1$, the argument in the proof of Lemma 7 holds.
For the choice (5.1) there exists an exponential number of
combinations $c_1, \ldots, c_{n_1}$ that solve the upper half knapsack $(j, A_{n_1})$ for
any fixed $j$. It can be assumed that a large proportion of these
choices give $j$ that does not yield a solution to $(j, W_{n_1})$. Thus, the
proof of Lemma 7 also proves Lemma 19.

Lemma 20. Let the algorithm be A3. The set $K_{1,j_n}$ takes at most
as long to compute as the set $K_{3,j_n}$.

Proof. The proof of Lemma 8 also proves Lemma 20.

Lemma 21. Let the algorithm be A3. The set of selected tuples
$K_{1,j_n}$ gives at most as long a median computation time as the worst in
the median tuple for $n$.

Proof. By Lemma 20 the set $K_{3,j_n}$ is slower to solve than the set
$K_{1,j_n}$ when the median is taken over the set of $j_n$. By Lemma 15
the tuple $K_2$ is not faster to solve than the set of $K_{3,j_n}$ when the
median is computed over the set of $j_n$ such that there is no solution
for $K_2$. As $K_2$ is a fixed n-tuple and does not depend on $j_n$ it
follows from the definition of the worst in the median tuple that
$K_2$ has at most as long a median computation time as the worst
in the median tuple for $n$, i.e., $f(n)$ for A3. We conclude that by
selecting and in a suitable way, the median time of deciding 
if $K_{1,j_n}$ has a solution is smaller than $f(n)$, where the median is
taken over the set of values of $j_n$ as in (2.5)-(2.6).



Lemma 22. Algorithm A3 is not a polynomial-time algorithm.

Proof. We have constructed in Sections 3 and 4 a sequence of $j_n$
growing as $2^n$, since it is sufficient to find one case of a knapsack
sequence that cannot be solved in polynomial time. By Lemmas
19 and 21 the inequality of Lemma 4 holds. Consequently, $f(n)$ is
not bounded by a polynomial function of $n$ and thus the
algorithm A3 is not a polynomial-time algorithm.

Discussion of the method of Section 5 

Figure 2 demonstrates the recursive definition (5.1) and extension 
of knapsacks in Definition 6. 




[Figure 2: labels "A. structure", "Worst n/2".]

Fig. 2. Extending knapsacks in A3.

The problem in Section 4 was comparing the median computation
times of $K_2$ and the set $K_{3,j_n}$. It seems natural that $K_2$ is
the more difficult of the two to solve, as it has a difficult upper half
knapsack problem that must be solved with the same set of $c_k$
as the lower half knapsack problem, while the set $K_{3,j_n}$ only has
the same lower half knapsack problem and a trivial upper half
knapsack problem. However, we could not show this, since the
unknown algorithm might in some way use the possible solutions to
the upper half knapsack problem and in such a way check fewer
cases. This is why we modified the original algorithm A1 into
A3, which can always take advantage of the upper half knapsack
solutions. This is done by always extending the knapsack
problem before solving it with A1 as one alternative, or solving it
directly with A1, whichever is faster. Extending the upper
half knapsack with a difficult knapsack only reduces the possible
solution space. Instead of having all possible combinations
of $c_k$, the algorithm A3 might only check those combinations of
$c_k$ which solve the upper half knapsack. For the inequality (2.9)
this only means that the possible cases are an exponential size
sample of the original set of possible cases. As the
set has an exponential size and it can be freely selected, we still
have a large enough space of possible values $j_n$ for the lemmas of
Section 3. The difficult part to prove in Section 4 becomes simple
as both $K_{3,j_n}$ and $K_2$ have the same upper half knapsack in the
case when A3 extends knapsacks.

6 A proof that P does not equal NP 

Theorem 1. Let an algorithm for the knapsack problem be
selected. There exist numbers $B, \alpha \in \mathbb{R}$, $B > 1$, $\alpha > 0$ and a
sequence

$$((j_n, (d_{1,n}, \ldots, d_{n,n})))_{n \ge 1}$$

of knapsacks satisfying

$$\log_2 j_n < B n^{\alpha}, \quad \log_2 d_{k,n} < B n^{\alpha}, \quad (1 \le k \le n), \ (n \ge 1)$$

such that the algorithm cannot determine in polynomial time if
there exist binary numbers $c_{k,n}$, $1 \le k \le n$, satisfying

$$j_n = \sum_{k=1}^{n} c_{k,n} d_{k,n}.$$

Proof. The idea in this proof is to compare the computation time
of the worst (in some sense) knapsack of size $n$ to the computation
time of the (in the same sense) worst knapsack of size $\frac{n}{2}$. This
computation time was defined in (2.7) and denoted by $f(n)$. In Sections
3 and 4 we showed several parts of inequality (2.9) but did not
manage to show the last part of (2.9). However, in Section 5 we
created another algorithm A3 and showed in Lemma 22 that it
is not a polynomial-time algorithm. As A3 is never slower than
the original selected algorithm, the original algorithm is not a
polynomial-time algorithm.

Theorem 2. P does not equal NP. 

Proof. The knapsack problem is well known to be in NP.